ABSTRACT



This study was conducted to demonstrate that a
context-dependent discrimination can produce learning difficulty in a
pseudo-spectrogram reading task and to examine what contribution
segmentation makes to that difficulty. Experiment 1 involved 10
subjects recruited from the University of Pittsburgh, who were shown
pseudo-spectrogram patterns and asked to respond by making
selections from a screen menu on a computer. Results indicated that a
context-dependent discrimination can be difficult to learn.
Experiment 2 attempted to determine whether the learning
difficulty observed in Experiment 1 was due to context-dependent
segmentation, to some other factor such as salience or task demands,
or to some interaction of these factors. Fifteen other subjects were
given a stack of 32 different patterns and asked to circle the
important parts. Results indicated that lack of salience may play an
important part in making this type of skill difficult to learn.
Results of the two experiments point to several factors which can
affect the difficulty of learning to read speech spectrograms.
Learning difficulty may be affected by the interaction of
segmentation with cue salience and task demands. The main conclusion
was that segmentation does influence learning difficulty
in speech spectrogram reading. (Two figures are included, and 19
references are attached.) (MG)
Abstract



This work examines possible sources of training difficulty encountered
by learners of speech spectrogram reading. Such difficulty has been
attributed to the context-dependent nature of the visual segmentation
of spectrogram patterns (Liberman et al., 1968), and suggestions by
researchers of other difficult skills (Biederman & Shiffrar, 1987) have
also implicated visual segmentation. In both cases, the discriminations
necessary to distinguish important parts can be easily made once
identified, but are enormously difficult to discover. The experiments
presented here used a pseudo-spectrogram reading task which varied
the segmentation rules subjects were required to discover. Experiment
1 found that considerable learning difficulty could be produced by this
task, but confounded the source of that difficulty among several
factors. The second experiment attempted to identify the sources of the
difficulty. Segmentation was found to contribute significantly. The
salience of the important cues and, potentially, the demands of the
learning task were also found to increase the difficulty of discovering
important visual distinctions. These results are discussed with respect
to the skill of spectrogram reading and theories of perceptual attention
learning.
Difficulties in Learning to Read Speech Spectrograms:
The Role of Visual Segmentation

When acquiring a new perceptual skill, a learner is usually 
faced with the problem of learning to recognize new features and 
discovering which combinations of features form meaningful patterns. 
In X-ray reading, for example, a student must learn which features 
indicate normal tissue and which indicate diseased tissue. Such 
learning is cognitive: the visual system breaks up the visual array into 
parts and recognition responses occur to learned features, but 
cognitive processing and training are required to make decisions about 
which parts are important and how they combine to form higher-level
patterns.

Theories of perceptual learning have characterized this cognitive
processing as a hypothesis-and-test procedure (Levine, 1975;
Trabasso & Bower, 1968) which results in the building of pattern
detectors (Kahneman, 1973; Chase & Simon, 1973). More recently,
attention has focused on the types of preferences or heuristics which
may be required to constrain hypothesis search in complex displays
(Michalski, 1983; Medin, Wattenmaker, & Michalski, 1987). One type of
constraint the cognitive system must apply is where to draw object
boundaries, i.e., which parts belong together as objects. Characteristics
such as spatial relations, overlap, proximity, and shading differences
may play a role in determining object coherence (Treisman, 1988). For
certain perceptual skills, however, such segmentation decisions can
create difficulties. In X-ray pictures, for example, brightness
corresponds to the density of tissue rather than any reflective property
(Squire, 1988). Hence, visual contours and separations may not
correspond to organ or tissue boundaries: if two organs
of equal density abut, no contour will appear between them. A
radiology student needs to learn a new way of segmenting an X-ray
picture to identify the locations of organs and other tissue groups.

The present research is concerned with learning difficulties
which may result when visual segmentation does not correspond to
object segmentation. Its focus is on a skill which until recently was
considered extremely difficult if not impossible to learn: speech
spectrogram reading. Much of the difficulty in spectrogram reading has
been attributed to problems in segmenting the display. The goal of this
research was, first, to show that learning difficulty could be produced
by violating segmentation assumptions and, second, to look at how
segmentation interacts with other stimulus and task variables.

A speech spectrogram is a graph of the energy in different
frequency components of speech over a short sampled time. Its two
axes represent frequency and time, and the darkness of a small region
represents the amount of sound energy at the frequency and time
matching its coordinates. When real-time spectrographic displays were
first developed, it was hoped that people, especially the hearing
impaired, could be taught to recognize speech by seeing it. However,
learning to identify speech from this graphical display has proven to be
difficult, requiring both an understanding of acoustic-phonetics and
many hours of practice. Potter, Kopp, and Green (1947), in one of the
earliest efforts toward such training, taught a group of subjects to
identify important acoustic features in spectrograms and then had
them try to communicate with each other using a real-time
spectrographic display. They found that the number of common words
spoken by a single person that subjects had learned increased linearly
with practice, at a rate of about 4 words per hour. That is, prior learning
did not aid the learning of new words. A similar learning rate was
found by Greene, Pisoni, and Carrell (1984), who had naive subjects
learn to identify spectrograms of 50 words made by a single speaker.
The subjects began with four words and were gradually given
additional sets of four words over 22 sessions. After about 13 sessions
the subjects were able to learn the new items with few errors, and they
showed a fair amount of transfer to a new list of words by the same
speaker (91.3%) and to the original word list spoken by a different
speaker (76%). These studies have been viewed optimistically as
demonstrating that people can be trained to recognize visual speech.
However, the studies are limited by their use of speech from a single
speaker, or by their focus on learning of individual words, which would
not generalize well to continuous speech.
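To make the definition of a spectrogram at the start of the preceding paragraph concrete, the following minimal Python sketch (not part of the original study) computes a short-time power spectrum of a synthetic two-tone signal. The sample rate, the Hann window, the window length, and the hop size are illustrative assumptions. A plain discrete Fourier transform is used so that the sketch needs only the standard library. Each inner list of `power` is one time slice, and larger values correspond to darker regions on a printed spectrogram.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n (tapers each analysis frame)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def power_spectrogram(signal, sample_rate, window_len=128, hop=64):
    """Return (freqs, times, power) where power[t][k] is the energy in
    frequency bin k during analysis frame t, i.e. the darkness value a
    spectrogram would plot at that (time, frequency) coordinate."""
    window = hann(window_len)
    n_bins = window_len // 2 + 1                      # 0 Hz up to the Nyquist frequency
    freqs = [k * sample_rate / window_len for k in range(n_bins)]
    times, power = [], []
    for start in range(0, len(signal) - window_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(window_len)]
        column = []
        for k in range(n_bins):
            # Plain DFT coefficient for bin k (an FFT would be used in practice).
            coef = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / window_len)
                       for i in range(window_len))
            column.append(abs(coef) ** 2)
        power.append(column)
        times.append((start + window_len / 2) / sample_rate)  # frame centre in seconds
    return freqs, times, power

# Hypothetical signal: a 500 Hz tone for 0.25 s, then a 1500 Hz tone for 0.25 s.
SR = 4000
signal = [math.sin(2 * math.pi * (500 if n < SR // 4 else 1500) * n / SR)
          for n in range(SR // 2)]
freqs, times, power = power_spectrogram(signal, SR)
```

In the resulting array, the energy sits in the 500 Hz band for the early frames and in the 1500 Hz band for the late ones. This is the kind of time-by-frequency pattern a reader of a real spectrogram must interpret.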

More impressive has been the effort of Dr. Victor Zue, who has
taught himself to read spectrograms of continuous speech,
independent of speaker, with a high level of accuracy (Cole, Rudnicky,
Zue, & Reddy, 1980). Zue systematically studied spectrogram patterns
for one hour per day over several years. Through this extensive practice,
together with his expertise in acoustic-phonetics, he has discovered
features and rules which enable him to identify phoneme segments with
an accuracy of about 85%. The features Zue uses are spectral patterns
unique to individual phonemes, but he augments simple detection of
these features with knowledge of coarticulation effects, which can
distort the features, and with knowledge of phonotactic constraints in
the English language. Zue has also been successful in identifying the rules
he uses to recognize phonemes and in teaching others to use the rules
to read spectrograms with much less practice (40 hrs vs 2000 hrs)
(Cole & Zue, 1980).

But what is the original source of the difficulty which limited
subjects in early studies to small vocabularies, and required 2000+
hours of training plus acoustic-phonetic knowledge on the part of
Victor Zue? In an article entitled "Why are speech spectrograms hard
to read?", Liberman et al. (1968) identify the major reason for this
learning difficulty as the context-dependent nature of the acoustic
signal. How a sound is articulated, and hence how it appears on a
spectrogram, depends on what other sounds are made immediately
before and after it. A vowel following a /d/ will look different from one
following a /g/. Context dependency leads to a special learning
difficulty because of the inherent difference between the way the visual
and auditory systems segment the acoustic pattern. To the visual
system, a vowel followed by a stop consonant appears as a wide dark
band beside a narrow dark band with a blank space in between, i.e.,
two distinct objects. However, the auditory segmentation of those two
sounds is more overlapping and blurred; part of the stop sound is due
to the vowel transition. Liberman et al. (1968) saw this difference
between the auditory and visual systems as so fundamental that they
asserted "no amount of training will cause an appropriate speech
decoder to develop for a visual input" (p. 131). Victor Zue has proven
their appraisal wrong, but he has also shown that their analysis of the
source of difficulty may be correct: much of his ability is based on his
knowledge of coarticulation (context-dependent) effects.
Why should context dependency and its associated segmentation
problem produce learning difficulties? According to Liberman et al.
(1968), the nature of the speech code is such that while the auditory
system has developed to deal with its temporal properties, the visual
system is not capable of processing it in a spatial layout. Yet Victor
Zue's performance demonstrates that it can be accomplished. The
question, then, is why, from a perceptual learning point of view,
context-dependent features are difficult to identify. One suggestion
comes from recent work by Biederman and Shiffrar (1987) on
chick-sexing. Biederman and Shiffrar (1987) demonstrated that for the skill
of determining the gender of day-old chicks, training time could be
drastically reduced by identifying non-accidental distinguishing
features. Chick-sexing reportedly takes several years of essentially
trial-and-error practice to achieve high proficiency. By identifying simple
invariant features, Biederman and Shiffrar were able to reduce these
years of training to a simple rule for finding a distinguishing contour.
Although they did not show why learning was originally so difficult,
Biederman and Shiffrar hypothesized that the critical distinguishing
features were obscured by their small size and by being embedded in
other parts. In such cases, they concluded, it is better to provide
instruction which points out the features than to hope they will be
discovered by the learner.

The same causes of difficulty may apply to the reading of speech
spectrograms. The context-dependent nature of the speech signal
causes the visual system to break up the display in inappropriate
places. Additionally, cognitive processes may be more likely to group
certain parts together into objects and restrict attention to these object
units (Ceraso, 1985; Kahneman, 1973). This may produce search
difficulties if features required to identify one pattern are spread across
different objects. An otherwise noticeable distinguishing feature may be
difficult to discover because it is in another "part." This hypothesis is
examined in the experiments which follow.

The question of interest to the present work is whether the
difficulty of learning spectrogram reading is produced by
context-dependent relations among visual features. To enable experimental
manipulation of the relations of interest, pseudo-spectrograms were
used. A computer program generated these pseudo-spectrograms based
on feature descriptions and interaction rules of real speech
spectrograms (Zue, unpublished). A general resemblance to actual
spectrograms was maintained.

The patterns used in the experiments were composed of two- or
three-phoneme syllables in a vowel-consonant or
consonant-vowel-consonant order. Examples of the pseudo-spectrograms used in
Experiment 1 are shown in Figure 1. The consonants used were the
stop consonants /b/, /p/, /t/, /k/, /d/, and /g/. The vowels used
were /i/ as in "beet," /u/ as in "boot," /ae/ as in "bat," /e/ as in
"bait," /ɔ/ as in "bought," and /o/ as in "boat." Vowel patterns were
quite similar to each other and appeared as wide striated areas with
two dark formants (F1 and F2) and one lighter formant (F3). Vowels
differed from each other in their width and the height of their three
formants.

The purpose of the two experiments described below is, first, to
demonstrate that a context-dependent discrimination (i.e., one whose
features cross an object boundary) can produce learning difficulty in a
pseudo-spectrogram reading task and, second, to look at what
contribution segmentation, as distinguished from other factors such as
salience, makes to that difficulty.

Experiment 1

To examine the difficulty of learning a context-dependent
discrimination, a task was set up to compare the learning of three
pairs of consonants: /b/-/p/, /t/-/k/, and /d/-/g/.
Because the objective was to look at within-pair discriminations,
between-pair discriminations were made simple by giving members of
the same pair similar widths, but members of different pairs very
different widths. Hence, /b/ and /p/ were both very thin, /t/ and /k/
were both wide, and /d/ and /g/ were both of medium width. Within-pair
discriminations were of three types: multiple cue, single cue, and
single context-dependent cue. The consonants /b/ and /p/ differed
from each other in texture, shape, and width, and could be
distinguished on any of these dimensions. The consonants /t/ and
/k/ could be reliably distinguished only by a single cue: they had the
same shape, width, and texture, but a different number of formants.
The consonants /d/ and /g/ could also be distinguished only by a
single cue, but this cue could not be found by looking at the
consonant pattern itself. The shape, width, and texture of /d/ and /g/
were identical, and the only way to tell them apart was by their
influence on an adjacent vowel. All of the consonants except /g/
made the second and third formants of an adjacent vowel curve
slightly downward at the consonant-vowel boundary. The consonant
/g/ made the second and third formants curve toward each other and
meet at the consonant-vowel boundary (velar pinch).

The prediction for the experiment was that the context-dependent
discrimination would be more difficult to learn than either
the single or multiple cue discriminations.

Method 

Subjects 

Ten subjects were recruited from the University of Pittsburgh. 
The subjects received credit towards an introductory psychology class 
and $10 for their participation. 

Apparatus

The pseudo-spectrogram patterns were shown to the subjects on 
the high resolution display screen of a XEROX 1108 computer. 
Subjects responded by using a mouse to make selections from a 
screen menu. The computer collected the subjects' responses and 
provided accuracy feedback to them. 

Materials

The pseudo-spectrogram patterns were generated by a computer 
program as screen bitmaps. The patterns were 346 x 346 pixels and
measured 10 cm x 10 cm on the display screen. The phoneme
patterns were drawn from descriptions which mapped a random
texture of a particular shade of gray to different regions of the space
the pattern was to occupy. The patterns were drawn as lines of these 
small texture patterns, the length of which was predetermined except 
when a line bordered a blank area. In that case the ending point of 
the line was set to a random number within 10 pixels (3 mm) of its 
predetermined ending point. Texture and line-length randomization 
thus provided a small amount of random variability in reappearances 
of the same phoneme. 
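The line-drawing procedure just described can be sketched as follows. This is a hypothetical reconstruction in Python, not the original program, which ran on a Xerox 1108. The bitmap is a list of rows of 0/1 pixels. A region's "shade" is assumed to be the probability that a pixel is dark, and the end-point jitter is assumed to be uniform within plus or minus 10 pixels, since the text says only "within 10 pixels." The function names are illustrative.

```python
import random

SIZE = 346        # pattern width and height in pixels (from the text)
MAX_JITTER = 10   # about 3 mm on the 10 cm display (from the text)

def line_end(planned_end, borders_blank, rng):
    """End column of one texture line.

    Lines keep their predetermined length unless they border a blank area;
    then the end point is jittered. Uniform +/- MAX_JITTER is an assumption.
    """
    if not borders_blank:
        return planned_end
    jittered = planned_end + rng.randint(-MAX_JITTER, MAX_JITTER)
    return max(0, min(SIZE - 1, jittered))   # stay inside the bitmap

def draw_region(bitmap, rows, start_col, planned_end, shade, borders_blank, rng):
    """Draw one line of random texture per row of the region.

    Assumption: `shade` is the probability that a pixel in the line is dark.
    """
    for r in rows:
        end = line_end(planned_end, borders_blank, rng)
        for c in range(start_col, end + 1):
            bitmap[r][c] = 1 if rng.random() < shade else 0

rng = random.Random(0)
bitmap = [[0] * SIZE for _ in range(SIZE)]
# Hypothetical region: rows 100-119, columns 50 to about 150, next to a blank area.
draw_region(bitmap, range(100, 120), 50, 150, shade=0.6, borders_blank=True, rng=rng)
```

Because the jitter is redrawn for every line, two renderings of the same phoneme differ slightly along any edge that faces blank space. That is the kind of variability between reappearances that the paragraph describes.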

The patterns for the phonemes /b/ and /p/ were thin long lines
of either a more striated (/b/) or more random (/p/) texture. For the
phonemes /t/ and /k/, the patterns were a background of random
texture with either a single dark area (for /k/) or two dark areas (for
/t/). Because the descriptions for the background textures of /t/ and
/k/ were identical, the only reliable way of distinguishing between
them was the presence of the extra dark area in /t/. The phonemes
/d/ and /g/ appeared as long striated patterns before a vowel and as
short striated patterns with two appendages after a vowel, but because
their descriptions were identical, the only reliable way to distinguish
between them was by the convergence or lack of convergence of the
formants in the adjacent vowel. Vowel patterns appeared as a striated
uniform background with two dark lower bars below a lighter bar.
Vowels could be discriminated by the amount of space between their
formants. When vowel formants were curved by the presence of an
adjacent /g/, only the center of the pattern could be used to
determine the real distance between formants.

Design 

Subjects participated in four one-hour sessions held on
consecutive days, except for one subject who participated in
only three sessions but learned all of the discriminations. The
spectrogram patterns the subjects saw were all possible
consonant-vowel-consonant combinations of the consonants /b/, /p/, /t/, /d/,
/g/, /k/ and the vowels /i/, /e/, /ae/, /ɔ/, /o/, /u/. The total
number of different combinations was 216 (6 x 6 x 6). Half of these
"words" (108) were used in each session, so that after four sessions the
subjects had seen each word pattern only twice. To control for the frequency of seeing
each phoneme, the words were blocked into groups of six in which
each consonant appeared once in prevocalic and once in postvocalic form, and
each vowel appeared once. A subject saw 18 such blocks in each
session. Before each session, the order of the words within each block
and the order of the blocks within the session were randomized.
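The blocking constraints above can be satisfied by a simple cyclic construction. The Python sketch below shows one such construction. It is offered as an assumption, since the paper does not say how words were assigned to blocks or which 108 words formed each half. Block (a, b) takes the i-th consonant as the initial sound, shifts the vowel by a and the final consonant by b, and this partitions all 216 words into 36 balanced blocks.

```python
import itertools
import random

CONSONANTS = ["b", "p", "t", "k", "d", "g"]
VOWELS = ["i", "e", "ae", "ɔ", "o", "u"]

def balanced_blocks():
    """Partition all 6 x 6 x 6 = 216 CVC words into 36 blocks of six.

    Block (a, b) holds (C[i], V[(i + a) % 6], C[(i + b) % 6]) for i = 0..5,
    so each consonant occurs once initially and once finally and each vowel
    once. Word (C[i], V[j], C[k]) belongs only to block ((j - i) % 6, (k - i) % 6).
    """
    return [[(CONSONANTS[i], VOWELS[(i + a) % 6], CONSONANTS[(i + b) % 6])
             for i in range(6)]
            for a, b in itertools.product(range(6), repeat=2)]

def session_order(blocks, rng):
    """Randomize word order within each block and block order within the session."""
    session = [rng.sample(block, len(block)) for block in blocks]
    rng.shuffle(session)
    return session

blocks = balanced_blocks()
half_a, half_b = blocks[:18], blocks[18:]   # hypothetical split into two 108-word halves
session_1 = session_order(half_a, random.Random(1))
```

The sessions could alternate between the two halves, for example half_a in sessions 1 and 3 and half_b in sessions 2 and 4. That would show each word exactly twice over four sessions, but the alternation is likewise an assumption.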

Procedure 


Subjects were tested individually. A subject was seated in front
of the computer and shown how to use a mouse to choose a letter
response from a screen menu. The experimenter then briefly explained
spectrograms and told the subject that his or her task was to
learn which letters were represented by each pattern. It was made
clear, however, that the task was a visual one, and the subjects were
discouraged from using strategies based on the sound properties of the
phonemes, such as stress or pitch.

When the experiment began, a pseudo-spectrogram pattern
appeared in the center of the display screen and remained there until
a response was given. Immediately after the pattern's appearance, the
message "Think about your answer..." appeared above the pattern in a
message box. Because of program differences, three of the subjects
saw this message on the screen for 20 seconds, while for the
remaining subjects the message remained on the screen for only 3
seconds. This difference was not expected to influence the results
because most responses, especially early in the experiment, required
more than 20 seconds. Next, a menu appeared on the screen along
with the message "Click on the first sound in the word." The menu
contained a list of the consonant responses, each with an example word
in which the consonant is used. After a subject selected one of the
consonants, a vowel menu appeared with the message "Click on the
second sound in the word." Once the vowel was selected, the
consonant menu reappeared for the third response. After the subject
made the final response, the program provided feedback. If all three
responses were correct, the message "That's correct" was displayed in
the message box. Otherwise, the message "That's wrong" was displayed
along with the correct answer. The pseudo-spectrogram pattern
remained on the screen for five seconds after feedback was given. The
subjects were allowed to take a short break halfway through the
session.

Shortly after the beginning and toward the end of each session,
the experimenter turned on a tape recorder and asked the subject to
continue with the next six trials while describing aloud what he or she
was looking at in the pattern and how he or she decided what to respond.

Results and Discussion 

A subject was considered to have learned a consonant pair if he
or she responded correctly to four consecutive trial blocks (8 problems),
with one error allowed on the third or fourth block. The learning point
was taken as the first of the four blocks. Not all of the subjects were
able to learn all three consonant discriminations within the allotted
time. Of the 10 subjects, 9 learned the /b/-/p/ distinction, 6 learned
the /t/-/k/ distinction, and 2 learned the /d/-/g/ distinction.
McNemar's exact test for correlated proportions indicated that
significantly more subjects learned the /b/-/p/ distinction than the
/d/-/g/ distinction (p < .02), but the test of whether more people
learned the /t/-/k/ distinction than the /d/-/g/ distinction
was not significant (p = .10).
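The learning criterion can be written as a small function. The Python sketch below is one interpretation, not the authors' scoring code. It reads "one allowed error on the third or fourth block" as at most one error in total across those two blocks, and it takes per-block error counts for a single consonant pair as input. When the criterion is never met, it returns 73, the value assigned to unlearned distinctions in the analysis that follows. The error counts in the example are hypothetical.

```python
def learning_point(errors_per_block, unlearned_value=73):
    """Return the 1-based trial block at which the criterion is first met.

    errors_per_block[n] is the number of errors on one consonant pair in
    block n + 1. Criterion (as interpreted here): four consecutive blocks,
    the first two error-free and at most one error across the last two.
    The learning point is the first of those four blocks.
    """
    for start in range(len(errors_per_block) - 3):
        first, second, third, fourth = errors_per_block[start:start + 4]
        if first == 0 and second == 0 and third + fourth <= 1:
            return start + 1
    return unlearned_value

# Hypothetical error counts for one subject on one consonant pair.
print(learning_point([2, 1, 0, 0, 1, 0, 0]))   # prints 3 (blocks 3-6 meet the criterion)
```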

A matched-pairs sign test was used to test whether the learning
points for the /b/-/p/ and /t/-/k/ distinctions were earlier than for
the /d/-/g/ distinction. Unlearned distinctions were considered to
have a learning point of at least 73 (i.e., one greater than the last
block). If two distinctions were unlearned, the learning points were
considered to be tied. Using this procedure, the /b/-/p/ and /t/-/k/
distinctions were found to have been learned at an earlier point than
the /d/-/g/ distinction (p < .01 and p < .02, respectively).
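Both tests reported above reduce to exact binomial calculations with probability .5, and the Python sketch below makes this explicit. The paper reports neither the discordant counts needed for McNemar's test nor the individual learning-point differences. It also does not say whether the tests were one- or two-sided. The sketch therefore assumes a two-sided McNemar test and a one-sided sign test, and the inputs in the usage lines are hypothetical.

```python
from math import comb

def binom_tail_le(k, n):
    """P(X <= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

def mcnemar_exact(b, c):
    """Two-sided exact McNemar test.

    b: subjects who learned distinction A but not distinction B.
    c: subjects who learned B but not A.
    Subjects who learned both or neither (concordant pairs) are ignored.
    """
    n = b + c
    if n == 0:
        return 1.0
    return min(1.0, 2 * binom_tail_le(min(b, c), n))

def sign_test(differences):
    """One-sided matched-pairs sign test that differences tend to be positive.

    Each difference could be, e.g., the /d/-/g/ learning point minus the
    /b/-/p/ learning point for one subject. Zero differences (ties, such as
    two unlearned distinctions both scored 73) are dropped.
    """
    pos = sum(1 for d in differences if d > 0)
    neg = sum(1 for d in differences if d < 0)
    n = pos + neg
    if n == 0:
        return 1.0
    # Under H0, P(at least `pos` positives) equals P(at most `neg` negatives).
    return binom_tail_le(neg, n)

print(mcnemar_exact(7, 0))       # 0.015625
print(sign_test([5, 3, 0, 2]))   # 0.125 (the tie is dropped)
```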

To obtain a measure of how much earlier the single- and
multiple-cue distinctions were learned, it was necessary for the
subjects to have learned to distinguish at least two of the three
consonant pairs. Four subjects failed to meet this criterion and were
not included in the measure. Of the six remaining subjects, only two
learned the /d/-/g/ distinction. For the others, the learning point was
estimated as 73. Because this value underestimates the true learning
point, the measure of when the /d/-/g/ distinction was learned is
conservative. Based on this measure, the mean number of trial blocks
required for subjects to learn each consonant pair discrimination is
provided in Table 1. According to these estimates, the /d/-/g/
distinction appears to require a considerably greater amount of
learning time than either the /b/-/p/ or /t/-/k/ distinction
(approximately 40 additional blocks).



                         Consonant Distinction

                     Multiple Cue   Single Cue   Context Cue
                       /b/-/p/       /t/-/k/       /d/-/g/

Mean                    20.17         29.17         66.17
Standard deviation      17.81         23.26         12.17
Number of estimated
  points                  0             0             4

Table 1: Mean number of trial blocks to reach learning
criterion for each consonant distinction.



These results suggest that a context-dependent discrimination
can be difficult to learn. Fewer subjects were able to learn the /d/-/g/
discrimination in the allotted time. The test on the proportion of learners
for each distinction showed that significantly more people learned the
multiple-cue distinction than the context-dependent one. The
difference between the proportion who learned the single-cue
distinction and the context-dependent one, though not significant, was
large (.60 vs .20). For those subjects who did learn the
context-dependent discrimination (or who were optimistically presumed to be
about to learn it when the experiment ended), learning took longer
than for either the multiple-cue or the single-cue discrimination.
These findings suggest that having to discover a context-dependent
discrimination could account for some of the difficulty encountered in
acquiring the skill of speech spectrogram reading.
However, these results must be viewed with caution. The
experiment examined learning of a realistic and complex pattern, and
likely confounded several factors with the context-dependent vs
non-context-dependent comparison. These factors must be ruled out before
learning difficulty can be unambiguously assigned to the
context-dependent manner in which the stimulus is segmented. One such
factor is cue salience. It may simply have been harder for the subjects
to notice the formant curving cue than the other cues. This
explanation is unlikely given that 8 of the 10 subjects mentioned in
their verbal reports that there was something unusual about the
appearance of the formants (i.e., that they were curved or straight).
Nevertheless, salience differences must be ruled out. Another
confounding factor is whether task demands, rather than segmentation
difficulty, made the /d/-/g/ distinction difficult to learn. Subjects may
have noticed the formant curving cue but, because they also were
required to learn the identity of the vowel, may have tried to use
formant curving to distinguish among the different vowels. This may
have "used up" the cue, making it unavailable for use in distinguishing
the consonants. There is support for this possibility in the verbal
reports made by several subjects who mentioned the formant curving
in conjunction with vowel discriminations. These two possible
alternative explanations are examined in Experiment 2.

Experiment 2

In Experiment 2, the goal was to determine whether the
learning difficulty observed in Experiment 1 was due to
context-dependent segmentation, to some other factor such as salience or task
demands, or to some interaction of these factors. Segmentation, in this
context, refers to how the cognitive system divides a pattern into
objects. Segmentation was manipulated by having two cues occur
within the same object or by splitting them between two objects.
Salience is how noticeable the features are. This was measured by
having a separate group of subjects circle the parts in the spectrogram
patterns used in this experiment. It was also controlled for in the
experimental design by having different groups of subjects learn each
distinguishing cue both as a between-object cue and as a within-object
cue. Finally, task demands refer to whether the subject was to treat
the different phonemes as separate parts in making a response. In this
experiment subjects made only a single response to the whole pattern,
but an attempt to vary task demands was made through instructional
bias.

Method 

Materials 

The pseudo-spectrogram patterns used in Experiment 2 were
similar to those used in Experiment 1, but to control for all of the
independent variables, several changes were made. First, the patterns
consisted of only two phonemes: a vowel-like pattern followed by a
consonant-like pattern. The vowel patterns were either thin (T) or wide
(W), had formants which were either straight (S) or curved (C), and were
either high (H) or low (L) in frequency (/i/ vs /ae/). Consonant
patterns could be large (L) or small (S) and had either one (O) or two
(T) formants. Formants appeared as dark spots on the large
consonants and as protrusions on the small consonants. Figure 2
shows some examples of these patterns. The pseudo-spectrogram
patterns were generated in the same way as those in Experiment 1;
the 32 vowel-consonant combinations were drawn 8 times for a total of
256 patterns.
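The count of 32 combinations follows from crossing the three two-level vowel features with the two two-level consonant features (2 x 2 x 2 x 2 x 2 = 32). Drawing each combination 8 times gives 256 patterns. The short Python sketch below enumerates that design. The feature labels are taken from the text, while the data representation is an illustrative assumption.

```python
import itertools

# Feature levels as described in the text (labels only; rendering not shown).
VOWEL_WIDTH = ("thin", "wide")
VOWEL_FORMANT_SHAPE = ("straight", "curved")
VOWEL_HEIGHT = ("high", "low")          # /i/ vs /ae/
CONSONANT_SIZE = ("large", "small")
CONSONANT_FORMANTS = ("one", "two")

def pattern_set(draws=8):
    """Return the 32 vowel-consonant combinations and the list of
    (combination, draw index) pairs, one per pattern to be rendered.
    Each draw would be rendered separately, so texture randomization
    makes the copies of a combination differ slightly."""
    vowels = list(itertools.product(VOWEL_WIDTH, VOWEL_FORMANT_SHAPE, VOWEL_HEIGHT))
    consonants = list(itertools.product(CONSONANT_SIZE, CONSONANT_FORMANTS))
    combos = list(itertools.product(vowels, consonants))
    drawn = [(combo, d) for combo in combos for d in range(draws)]
    return combos, drawn

combos, drawn = pattern_set()
print(len(combos), len(drawn))   # 32 256
```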

To assess the salience of the patterns' visual features, a group
of 15 subjects (not the same as those in the learning task) were given
a stack of the 32 different patterns and asked to circle the "important
parts." The results of this circling task are given in Table 2. Of
relevance to the present experiment is the finding that the subjects
circled the vowel formants an average of 98% of the time, while
circling the consonant formants an average of only 76% of the time.
Furthermore, the subjects tended to circle curved vowel formants as a
single part (67% of the time), and straight vowel formants as separate
parts (83% of the time). The first consonant formant was circled more
often than the second (81% vs 68%), and formants in the large
consonants were circled more often than formants in the small
consonants (90% vs 59%). Hence, some of the difference in salience
between vowel formants and consonant formants may be due to
difficulty seeing the small consonant formants as distinct parts.



Feature                           Proportion

Whole Vowel                          .13
1st Vowel Formant                    .97
2nd Vowel Formant                    .99
3rd Vowel Formant                    .98
All Other Vowel Features             .22
Whole Consonant                      .33
1st Consonant Formant                .83
2nd Consonant Formant                .69
All Other Consonant Features         .29

Table 2: Proportion of time a feature was circled in the
part-circling task.



Design 

The goal of the experiment was to assess whether a
within-object cue would be learned more readily than a between-object
cue. To avoid confounding the type of cue (formant curving or number
of formants) with the location of the cue (within or between objects),
each cue type was learned both as a within-object cue and as a
between-object cue. Because this could not be manipulated within
subjects, an incomplete blocks design was used. Each subject provided
two observations from the 2 x 2 (Cue Type x Cue Location) design,
and a block of two subjects with complementary conditions constituted
a single replication of the design. This confounds the Cue Type x Cue
Location interaction with subjects, but by running enough replications,
this effect could be analyzed as a between-block factor.

One additional factor, Instruction, was also included as a
between-block factor. One half of the blocks received neutral
instructions, which asked them to learn to associate the whole pattern
with a response; the other half received biasing instructions, which
asked them to learn the half of the pattern containing the
within-object cue. The within- and between-block designs made up four
conditions: Neutral Instructions, Curve-Within (NCW); Neutral
Instructions, Curve-Between (NCB); Biased Instructions, Curve-Within
(BCW); and Biased Instructions, Curve-Between (BCB). The
Curve-Within/Curve-Between distinction refers to the type of rules
subjects were to learn. Table 3 shows these rules for each condition.



Condition                 Left Pattern      Right Pattern             Response
(Instructions-Curve
location)

Neutral-Within (NCW)      Curved, Thin                                /d/
                          Straight, Thin                              /k/
                          Wide              One formant               /t/
                          Wide              Two formants              /g/

Neutral-Between (NCB)     Curved            Small                     /d/
                          Straight          Small                     /k/
                                            Large and One formant     /t/
                                            Large and Two formants    /g/

Biased-Within (BCW)       Curved, Thin                                /d/
                          Straight, Thin                              /k/
                          Wide              One formant               /t/
                          Wide              Two formants              /g/

Biased-Between (BCB)      Curved            Small                     /d/
                          Straight          Small                     /k/
                                            Large and One formant     /t/
                                            Large and Two formants    /g/

Table 3: Rules for discriminating patterns in Experiment 2
The Curve-Within groups learned the formant curving cue as a
within-object cue and the number of formants cue as a between-object
cue; the Curve-Between groups learned the formant curving cue as a
between-object cue and the number of formants cue as a within-object
cue.

Subjects participated in a single two-hour session. The
pseudo-spectrogram patterns the subjects saw were all possible
vowel-consonant combinations as described above. To control for the
frequency of seeing each phoneme, the patterns were grouped into
blocks of eight in which each consonant appeared twice and each
vowel appeared once. Before each session, the order of the patterns
within each block and the order of the blocks within the session were
randomized for each subject.
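The blocking constraint described above can be sketched in Python. This is a hypothetical illustration, not the program used in the experiment. It assumes eight vowels (implied by blocks of eight in which each vowel appears once, giving the 32 patterns), uses placeholder vowel labels V1 to V8, and assumes 16 blocks per session (inferred from the later use of block 17 as "the last trial block plus one"). Each pass over the 32 patterns is split into four blocks by a cyclic assignment; the order within blocks and the order of blocks are then shuffled.

```python
import random

CONSONANTS = ["t", "k", "d", "g"]
# Placeholder vowel labels (hypothetical): eight vowels are inferred from
# blocks of eight in which each vowel appears once.
VOWELS = [f"V{i}" for i in range(1, 9)]
N_PASSES = 4  # 4 passes x 4 blocks = 16 blocks per session (inferred)


def make_session(rng, n_passes=N_PASSES):
    """Return a list of blocks; each block is a list of (consonant, vowel) trials.

    Every block has each consonant twice and each vowel once, and each pass
    of four blocks contains all 32 consonant-vowel patterns exactly once.
    """
    blocks = []
    for _ in range(n_passes):
        vowels = VOWELS[:]
        consonants = CONSONANTS[:]
        rng.shuffle(vowels)
        rng.shuffle(consonants)
        for b in range(len(consonants)):
            # Vowel slot v gets consonant (v + b) mod 4: slots v and v + 4 share
            # a consonant, so each consonant occurs twice per block, and over
            # b = 0..3 every vowel is paired with every consonant once.
            block = [(consonants[(v + b) % 4], vowels[v]) for v in range(len(vowels))]
            rng.shuffle(block)  # random order of patterns within the block
            blocks.append(block)
    rng.shuffle(blocks)  # random order of blocks within the session
    return blocks


session = make_session(random.Random(1))
print(len(session), session[0])
```

The cyclic assignment is one simple way to satisfy both constraints at once; any other assignment that keeps two trials per consonant and one per vowel in each block would serve equally well.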

Procedure 

Subjects were tested individually. Each subject was seated in
front of the computer and shown how to use a mouse to choose a
letter response from a screen menu. Then the instructions for the
experiment were displayed on the screen. Subjects in the neutral
conditions were told their task was to learn to identify which pattern
was displayed; subjects in the biased conditions were told to identify
the left (or right) pattern. To ensure that the subjects in the biased
conditions read the instructions, they were asked to identify which half
(left or right) of the pattern they were to learn. If they were incorrect,
the instructions reappeared on the screen.

The experiment began with a pseudo-spectrogram pattern
appearing in the center of the display screen. The message "Think
about your answer..." appeared in a message box above the pattern for
3 seconds. Then a menu appeared on the screen along with the
message "Click on the first sound in the word." The menu contained a
list of four responses: /t/, /k/, /d/, and /g/. After the subject made a
response, the program provided feedback. If the response was correct,
the message "That's correct" was displayed in the message box.
Otherwise, the message "That's wrong" was displayed along with the
correct answer. Once feedback was given, the pseudo-spectrogram
pattern remained on the screen for 10 seconds before being replaced
by the pattern for the next trial. Every 32 trials, the subject was
allowed to take a short break before continuing.

After the session, the experimenter turned on a tape recorder
and asked the subject to identify 8 patterns and describe what she
looked at in the pattern and how she decided what to respond.

Subjects

Forty-eight introductory psychology students from the University
of Pittsburgh participated for course credit. Two subjects, both from
the Neutral-Curve-Between condition, were replaced: one quit the
session early; the other had not slept for 48 hours prior to the
experiment session and showed no learning. The remaining subjects
were randomly assigned to the four conditions with the constraint of
obtaining 9 full or partial learners (as described below) in each
condition.

Results and Discussion 

As in Experiment 1, subjects had considerable difficulty learning
both the within- and between-object distinctions. A subject was
considered to have learned a distinction when correct responses were
made on two consecutive blocks (8 problems), with one error allowed
on the second block (two subjects were also considered to have learned
a distinction on their final block if the final block was correct and they
gave the correct rule for the distinction in their post-session interview).
By this criterion, the 48 subjects fall into three categories: full
learners, non-learners, and partial learners. Full learners were those
who learned both the between- and within-object distinctions;
non-learners learned neither distinction; partial learners were those
who learned only one of the two distinctions. Table 4 summarizes how
the subjects performed. Eighteen subjects were full learners, twelve
were non-learners, and eighteen were partial learners. Of the partial
learners, 13 learned only the within rule and 5 learned only the
between rule. Of the non-learners, one was from the NCB condition,
two from the BCW condition, and nine from the NCW condition.
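A minimal sketch of the learning criterion follows, under stated assumptions. Errors are counted per block on the trials of the two consonants involved in a distinction (four per block, so two blocks give the 8 problems). The learning point is taken to be the first block of the criterion pair. A session is assumed to have 16 blocks, so an unlearned distinction is assigned block 17, as in the analyses that follow. The function name and its inputs are hypothetical.

```python
N_BLOCKS = 16             # inferred from "the last trial block plus one" = 17
UNLEARNED = N_BLOCKS + 1  # value assigned to distinctions never learned


def learning_block(errors, stated_rule=False):
    """Return the 1-based block at which a distinction reached criterion.

    errors[i] is the number of errors in block i + 1 on the trials of this
    distinction (assumed: the four trials per block showing its two consonants).
    Criterion: an error-free block followed by a block with at most one error;
    the first block of that pair is returned (an assumption).
    """
    for i in range(len(errors) - 1):
        if errors[i] == 0 and errors[i + 1] <= 1:
            return i + 1
    # Exception from the text: an error-free final block counts if the subject
    # also stated the correct rule in the post-session interview.
    if stated_rule and errors and errors[-1] == 0:
        return len(errors)
    return UNLEARNED


print(learning_block([2, 1, 0, 1, 0, 0]))
```

Returning the first block of the pair is consistent with the exception for subjects whose final block was correct: for them, the learning block is the final block itself.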
Discriminations learned        NCW   NCB   BCW   BCB

Both discriminations             4     9     1     4
One discrimination
   Within rule only              5     0     7     1
   Between rule only             0     0     1     4
Neither discrimination           9     1     2     0

Table 4: Discriminations learned, by condition



A matched-pairs sign test was used to test the main effects of
Cue Location and Cue Type for those subjects who were full or partial
learners. For partial learners, the learning point of the unlearned
distinction was considered to be at least 17 (the last trial block plus
one). By this test, the main effect of Cue Location was not significant
(z = 1.39, p < .09), but the main effect of Cue Type was significant
(z = 4.18, p < .001). The subjects learned the formant curving cue before
the number of formants cue significantly more often than they learned
them in the reverse order. To test the Cue Type X Cue Location
interaction, each subject's performance was categorized according to its
sign. A chi-square test of independence revealed that the interaction
was significant (χ²(2) = 19.35, p < .001). Formant curving was learned first
as a within-object cue just as often as it was learned first as a
between-objects cue, but the number of formants cue was learned first
as a within-object cue more often than as a between-objects cue.
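The two tests in this paragraph can be sketched as follows. This is an illustrative reconstruction, not the original analysis code. The sign test uses the normal approximation z = (n+ - n-)/sqrt(n), with ties dropped and no continuity correction (the original may have differed). The chi-square test is the ordinary Pearson statistic. Crossing the two cue-location groups with each subject's sign category (curving first, tie, formants first) presumably gives a 2 x 3 table with 2 degrees of freedom, matching the reported χ²(2). For 2 degrees of freedom, the upper-tail probability is exactly exp(-χ²/2). The learning blocks in the usage lines are made up.

```python
import math


def sign_test(first, second):
    """Matched-pairs sign test with the normal approximation.

    first[i], second[i]: learning blocks of the two cues for subject i
    (unlearned distinctions already set to block 17). Ties are dropped and
    no continuity correction is applied (an assumption). Returns (z, p):
    z > 0 means `first` was usually learned earlier; p is one-tailed.
    """
    n_plus = sum(1 for a, b in zip(first, second) if a < b)
    n_minus = sum(1 for a, b in zip(first, second) if a > b)
    n = n_plus + n_minus
    if n == 0:
        return 0.0, 0.5
    z = (n_plus - n_minus) / math.sqrt(n)
    p = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return z, p


def sign_category(a, b):
    """Classify one subject by which cue reached criterion first."""
    if a < b:
        return "first"
    if a > b:
        return "second"
    return "tie"


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail chi-square probability; exact only for df = 2."""
    return math.exp(-stat / 2)


# Hypothetical usage (made-up learning blocks, not the study's data).
curving = [3, 4, 5, 6, 2]
formants = [5, 4, 7, 2, 17]
z, p = sign_test(curving, formants)
cats = [sign_category(a, b) for a, b in zip(curving, formants)]
print(z, p, cats)
```

As a consistency check on the reported values, exp(-19.35/2) is well below .001.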

To obtain a measure of when the distinctions were learned, the
learning point for unlearned distinctions was estimated as the 17th
block. This value underestimates the true learning block and makes
the measure conservative. Most of these estimations were made for the
between-object distinction when it involved the number of formants
cue. This is also consistent with the observation that an unusually
large number of non-learners were found in the conditions which
required learning this distinction (the NCW and BCW conditions).
Making these estimations, the mean learning block for each distinction
and condition was calculated. These values are given in Table 5. The
measures indicate that the number of formants cue was learned at
least five blocks earlier as a within-object cue than as a between-object
cue, but the formant curving cue was learned at about the same point
for both locations.

                                          Cue Location
Cue Type                               Within     Between

Number of formants
   Mean                                 10.28       15.56
   Standard deviation                    5.04        2.59
   Number of estimated points               4          12
Formant curving
   Mean                                  8.72        7.44
   Standard deviation                    4.23        3.75
   Number of estimated points               1           1

Table 5: Mean number of trial blocks to reach learning criterion for each consonant distinction
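The differences cited in the text can be recomputed from the means in Table 5. The sketch below assumes each cell contains the 18 full or partial learners from its two conditions, so an unweighted average over locations equals the pooled mean. Because unlearned distinctions were set to block 17, any difference involving the between-object number of formants cell is conservative.

```python
# Mean learning blocks from Table 5 (unlearned distinctions counted as block 17).
means = {
    ("formants", "within"): 10.28,
    ("formants", "between"): 15.56,
    ("curving", "within"): 8.72,
    ("curving", "between"): 7.44,
}

# Extra blocks needed when a cue is between- rather than within-object.
location_diff = {
    cue: means[(cue, "between")] - means[(cue, "within")]
    for cue in ("formants", "curving")
}

# Cue Type difference averaged over the two locations (equal cell sizes assumed).
locations = ("within", "between")
cue_gap = sum(means[("formants", loc)] - means[("curving", loc)] for loc in locations) / 2

print({cue: round(d, 2) for cue, d in location_diff.items()})  # {'formants': 5.28, 'curving': -1.28}
print(round(cue_gap, 2))  # 4.84
```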



These results suggest that lack of salience may play an
important part in making this type of skill difficult to learn. The sign
test demonstrates that the formant curving cue was more often learned
before the number of formants cue, and the estimates of learning
points show that the formant curving cue was learned at least 4
blocks earlier, on average. The cause of this difference is likely to be
cue salience. In the part-circling task, more subjects circled the vowel
formants than the consonant formants, suggesting that the vowel
formants are more salient. The effect of salience, however, does not
explain the learning difficulty observed in the first experiment. In
Experiment 1, number of formants as a within-object cue was learned
sooner and more often than the formant curving cue as a
between-object cue. If this were due to salience, then we should have found
that the number of formants cue was learned sooner than the formant
curving cue in Experiment 2.
Nor can segmentation by itself account for the observed learning
difficulty. Cue Location was not significant, and even the interaction of
Cue Type and Cue Location does not produce a simple explanation.
Context-dependent segmentation does appear to produce learning
difficulty, but this effect may be restricted to cues of lower salience.
The chi-square test on the interaction of Cue Type and Cue Location
showed that more subjects learned the formant curving cue before the
number of formants cue when the number of formants cue was a
between-objects cue, but when the number of formants cue was a
within-objects cue, the order of learning did not depend on cue type.
Thus, difficulty due to cue location was found for the less salient
number of formants cue but not for the more salient formant curving
cue. For the less salient cue, however, the degree of impairment appears
to be substantial. More non-learners (11 vs 1) and within-rule-only
learners (12 vs 1) were reported in the conditions which required
learning the number of formants cue as a between-objects cue.
Additionally, the conservative estimate of learning points indicates that
this cue was learned at least five blocks later as a between- than as a
within-object cue.

Yet segmentation does not explain the learning difficulty
observed in the first experiment. In Experiment 1, the formant curving
cue as a between-object cue was found to be much harder to learn
than the number of formants cue as a within-object cue. This finding
was not replicated in the second experiment. In fact, the opposite was
found. Neither salience nor segmentation can account for this
difference because neither was changed between the two experiments.
The only major change was the learning task.

Presumably, the reason the formant curving cue was difficult to
learn in Experiment 1 was the vowel response required in that task.
This was not manipulated in the second experiment, so it is impossible
to be certain. It is interesting to note, however, that the difficulty
disappeared when the vowel identification task was eliminated in
Experiment 2. Unfortunately, the manipulation of instructional bias in
this experiment was too weak to clarify this question. Half of the
subjects were instructed to "learn to identify the right [or "left"] hand
part" of the pattern, but in post-experiment interviews several admitted
to ignoring these instructions. Instructional bias did not significantly
interact with either Cue Type or Cue Location (χ²(2) = 3.31, p > .10, and
χ²(2) = 3.82, p > .10, respectively). Future research should determine
whether task demands cause the difficulty observed in Experiment 1
by manipulating task demands more strongly within a single
experiment.

General Discussion 

The two experiments presented here point to several factors
which can affect the difficulty of learning to read speech spectrograms.
The original hypothesis, that learning difficulty was caused by
context-dependent relations created by the way the visual system segments
spectrogram patterns, has been shown to be too simple. Learning
difficulty for this skill may be affected by the interaction of
segmentation with cue salience and task demands. Segmentation was
shown to have a considerable influence on difficulty, but this influence
may be restricted to less salient cues. Segmentation may also be
influenced by the demands of the learning task. Although the
experiments did not demonstrate this, it is likely that the type of
response required by the learning task influences task difficulty. The
following discussion examines in more detail why segmentation might
interact with these factors.

The interaction of segmentation with cue salience can be
explained by assuming that whatever learning difficulties are produced
by segmentation can be overcome by a highly salient cue. Salience
has long been known to influence hypothesis selection in
discrimination learning tasks (Trabasso & Bower, 1968). Highly salient
cues are likely to be tried first as hypotheses. If the effect of
segmentation is to make certain cues less available for selection as
hypotheses, then it is easy to understand why a high degree of
salience would overcome this effect. This explanation is supported by
the results of the second experiment reported here, in which the mean
learning block was about the same for all distinctions except in the
condition where the less salient number of formants cue was a
between-objects cue. When the formant curving cue was a
between-objects cue, its high salience made it available for attention
anyway.
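The explanation above can be illustrated with a toy hypothesis-sampling model. This is a hypothetical sketch, not a model fitted or proposed in this study. Each candidate cue is sampled with probability proportional to its salience, in the spirit of salience-weighted cue selection. A between-object cue additionally has its weight multiplied by an assumed segmentation discount. The salience values, the discount of 0.3, and the extra irrelevant cue named "other" are all made up. Under sampling with replacement the waiting time is geometric, so the expected number of samples before a cue is first tried is 1/p.

```python
def selection_prob(cues, target, between_discount=0.3):
    """Probability that `target` is the next hypothesis sampled.

    cues maps a cue name to (salience, is_between_object). The sampling
    weight is the salience, multiplied by `between_discount` for
    between-object cues (a hypothetical stand-in for segmentation).
    """
    def weight(salience, between):
        return salience * (between_discount if between else 1.0)

    total = sum(weight(s, b) for s, b in cues.values())
    salience, between = cues[target]
    return weight(salience, between) / total


def expected_samples(cues, target, between_discount=0.3):
    """Mean number of samples (with replacement) until `target` is tried."""
    return 1.0 / selection_prob(cues, target, between_discount)


# Curve-Within: curving is a within-object cue, number of formants between.
curve_within = {"curving": (4.0, False), "formants": (1.0, True), "other": (2.0, False)}
# Curve-Between: the two locations are swapped.
curve_between = {"curving": (4.0, True), "formants": (1.0, False), "other": (2.0, False)}


def location_cost(cue):
    """Extra expected samples when `cue` is between- rather than within-object."""
    as_between = curve_between if cue == "curving" else curve_within
    as_within = curve_within if cue == "curving" else curve_between
    return expected_samples(as_between, cue) - expected_samples(as_within, cue)


for cue in ("curving", "formants"):
    print(cue, location_cost(cue))
```

In this toy setting the same discount costs the salient curving cue about two extra samples but the weak formant-count cue about seventeen. This qualitatively mirrors the Cue Type X Cue Location interaction reported above; the numbers themselves carry no empirical weight.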

Although neither experiment directly manipulated task demands,
the difference between the results of the two experiments suggests that
the type of response the subjects were required to give was also
important. In the first experiment, where the subjects were required to
respond to both consonants and vowels, they had difficulty learning
the highly salient formant curving cue as a between-object cue. In the
second experiment, where subjects made only a single response to the
whole pattern, formant curving was no more difficult to learn as a
between-object cue than as a within-object cue. Since subjects in
Experiment 1 reported using formant curving to distinguish the vowel
responses, it seems likely that including the vowel response made it
more difficult to notice the relevance of the formant curving to the
consonant distinction, perhaps in the following way. A subject might
select the cue as a hypothesis for vowel identification. When this
hypothesis was disconfirmed, it may have become less likely to be
selected again immediately. If the formant curving cue was
selected as relevant for vowel discrimination because of the way that
spectrograms are segmented visually, it might be less available as part
of a consonant discrimination. In the second experiment, when the
vowel identification task was eliminated, subjects were more able to
learn formant curving as a between-object cue.

Task demands may also have increased learning difficulty by 
reinforcing any existing segmentation biases. If subjects were required 
to make two responses to a pattern, they may have been more likely to 
see the pattern as two distinct parts, and possibly to assign one 
response to one part, and the other response to the remaining part. 
This may have enhanced any existing bias against crossing part 
boundaries. This hypothesis can be tested only by future research. 

The main conclusion of the present research is to confirm the
influence of segmentation on learning difficulty in speech spectrogram
reading. Although segmentation was not found to be the sole
determiner of such difficulty, in combination with other stimulus and
task variables it appeared to have a substantial influence. One way of
thinking about the effect of segmentation is as a within-object search
bias. People may be biased toward searching within an object's part
boundaries (contour) for discriminating features before considering
features outside those boundaries. This bias, however, can be
overridden by a highly salient feature in another part. The learning task
also matters for the within-object search bias: if a feature can be
used as a within-object cue, then it may be less likely to be considered
as a between-object cue. Such factors may have led the subjects in
Experiment 1 to believe, incorrectly, that formant curving indicated
vowel identity, and may have impaired their ability to associate it with
consonant identity.

The existence of a within-object search bias is consistent with
several theories of visual attention. According to the view taken by
Kahneman (1973; Kahneman & Henik, 1981) and Ceraso (1985),
attention to a visual scene is allocated by object units. According to
Kahneman's (1973) model of attention and perception, preattentive
visual processes divide a display into units according to stimulus
properties and simple grouping rules (such as Gestalt rules). These
units are given figural emphasis (attention) based on factors such as
figure-ground relations, features which make something stand out,
and intention. Units which receive this attention are then matched
against memory structures to test for recognition. Visual search
involves the intentional switching of figural emphasis from object to
object, or the attraction of figural emphasis based on a feature (either
stimulus- or response-selected) which distinguishes the target.
According to the results of the experiments presented above, the
features of a target phoneme unit are more likely to be considered
than features of other phonemes, unless those other features are
highly salient. This result may be due to the way attention is allocated
to a whole part unit. If whole phonemes are attended as wholes, then
the features within the attended phoneme will receive figural emphasis
and be further processed as potential hypotheses. However, if a highly
salient feature, one which draws attention to itself, is in a neighboring
phoneme, it may be included in processing and may even be selected
earlier as a hypothesis. According to this attention-by-parts view, the
within-object search bias may be the result of normal attention
allocation policy within the visual system.
A within-object search bias is also consistent with recent
suggestions that preferences and heuristics are required to restrict the
amount of search involved in concept learning (Michalski, 1983;
Medin, Wattenmaker & Michalski, 1987). This view is not inconsistent
with the attention-by-parts hypothesis, but emphasizes the functional
role of such a bias in the learning process. In complex visual
environments, ordered search for important features (even
salience-ordered search) is too resource-consuming to be viable. Rather,
preferences for certain features or locations are required to restrict the
scope of search. Restricting the search for a discriminating feature to
the area within the object boundaries of a part is a sensible heuristic.
In our normal visual perception, objects are classified or discriminated
by features within their own object boundaries. Only in certain
artificial environments, such as speech spectrograms or x-ray pictures,
are context-dependent relations set up by visual segmentation. In such
environments, what is normally a useful heuristic actually hinders
search rather than aiding it.

In the second experiment, what was observed was not a
facilitating effect for a within-object cue, but an increased difficulty in
locating a between-object cue. Cues with low salience can be fairly
easily located when they are within the same object, but when a
low-salience cue must be found in a nearby object, learning difficulty is
increased, probably by a tendency to retry discarded within-object
hypotheses. This result has important implications for speech
spectrogram reading. First, it explains at least part of the enormous
difficulty in learning the skill of speech spectrogram reading. In
spectrogram reading, the large variability in the appearance of
phonemes means that the salience of most features is likely to be
quite low. Also, it is important to learn spectrogram patterns at the
individual phoneme level. Hence, the narrow focus induced by the task
should be expected to increase the within-object search bias and
impair discovery of context-dependent features.

Some individuals, too, might be more affected by a search bias
than others. For some, it may only slow down search, with the
low-salience context-dependent feature found only after within-object
features have been searched. For others, it may mean the complete
abandonment of search after a within-object search has failed. Such
differences depend on an individual's repertoire of strategies and
learning history. Fortunately for students of spectrogram reading,
Victor Zue has identified many of these features, so they do not have
to be discovered anew.

In most visual environments and for most perceptual skills, a
within-object bias is helpful. It restricts the amount of search required
for learning. However, for other environments and skills, such as
speech spectrogram reading, radiology, and passive sonar reading,
where visual objects and real objects do not directly correspond
(Lesgold et al., 1988; Liberman et al., 1968; Smith, 1982), it becomes a
source of learning difficulty. Overcoming such search biases may be an
important part of learning for these skills.